Skip to content

Fix geo_centroid over geo_shape merging multiple shards#144637

Open
craigtaverner wants to merge 13 commits intoelastic:mainfrom
craigtaverner:fix_centroid_shards
Open

Fix geo_centroid over geo_shape merging multiple shards#144637
craigtaverner wants to merge 13 commits intoelastic:mainfrom
craigtaverner:fix_centroid_shards

Conversation

@craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Mar 20, 2026

As described in #144504 we are seeing different results between geo_centroid in QueryDSL and ES|QL's ST_CENTROID_AGG when the data is geo_shape (or cartesian shapes). The root cause turned out to be that geo_centriod was using point-only logic for merging intermediate aggregation results between shards, which means it was considering doc-count only for the weighted merge, and missing:

  • Area weights (large polygons count more than small ones, and long lines count more than short ones)
  • Dimensional type (shards with only points should not count at all if other shards contain lines or polygons)

This PR contains appropriate unit tests, but manual testing on the OSM dataset we used in the benchmarking leads to:

test ESQL doc count ESQL centroid Query DSL doc count QueryDSL centroid
original 3818897 POINT (0.7135620032440881 54.048676344286505) 3818897 {lat: 52.91002138728532, lon: 4.350566187678227}
after area-weight fix 3818897 POINT (0.7135620032440897 54.04867634428662) 3818897 {lat: 51.63239198052063, lon: 8.3761283280999}
after dimensional-type fix 3818897 POINT (0.7135620032440892 54.04867634428658) 2017758 {lat: 54.048676344287394, lon: 0.7135620032440999 }

Note: the small deviations on the 15th and 16th decimal point of the ES|QL results actually occur when simply re-running the same query over and over and are likely a result of floating point errors related to the order in which results are added to the Kahan summation and are not related to the fixes.

Fixes #144504

@craigtaverner craigtaverner added >bug :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Mar 20, 2026
@craigtaverner craigtaverner requested a review from iverase March 20, 2026 11:23
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

@iverase
Copy link
Contributor

iverase commented Mar 20, 2026

I would expect this to have some BWC side effects?

protected final double firstWeightedSum;
protected final double secondWeightedSum;
protected final double totalWeight;
protected final DimensionalShapeType shapeType;
Copy link
Contributor

@iverase iverase Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this as a record that is null (or a constant) for points? I want to make the canonical footprint of this object smaller as we can be thousands of those objects on heap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. I've done that now, with a record that we set to null for points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Geo Indexing, search aggregations of geo points and shapes >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ST_CENTROID_AGG different results than QueryDSL over shapes

3 participants